Inculcating Context for Emoji Powered Bengali Hate Speech Detection using Extended Fuzzy SVM and Text Embedding Models

Abstract

The massive growth of social media offers opportunities to communicate in diverse languages, using unstructured text, informal posts, misspelled content and emojis. Social media users feel comfortable expressing their emotions, especially high-intensity emotions (hate speech), in their mother tongue. Hate speech in any form targets groups or individuals and may trigger antisocial activities, hate crimes and terrorist acts. Bengali social media users often post hate speech as implicit or indirect text. Existing hate speech detection research considers explicit hate speech, but actual hate is more often expressed in an implicit way. To detect both explicit and implicit hate speech in low-resource Bengali content, highly efficient automated tools are needed. Researchers have applied discriminative learning approaches (e.g. SVM, MLP, CNN) that distinguish hate speech text only with clear-cut outcomes, detecting direct hate speech. We propose a novel model with two parallel approaches: (i) an extended fuzzy SVM classifier for class-imbalanced datasets (FSVMCIL) with multilingual BERT (mBERT) embeddings produces the first label; (ii) a morphological analysis of the content combined with the hate similarity (HS) scheme produces the second label. By linking the two labeling methods, the model extracts contextual hate speech. The HS scheme applies Word2Vec embeddings with a hate word lexicon. The model also applies emoji conversion and analysis. The study conducts extensive experiments on various categories of the dataset and evaluates performance in terms of weighted F1 score, precision, recall and accuracy. Results reveal a significant improvement, with a 2.35% increase in F1-score and a 9.11% increase in accuracy.
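The weighted F1 score named in the abstract is commonly defined as the mean of the per-class F1 scores, each weighted by the number of true examples of that class, which matters when the classes are imbalanced. The following is a minimal sketch of that common definition in plain Python. The function name `weighted_f1` and the labels `"hate"` and `"none"` are hypothetical, chosen for illustration; the abstract does not say which implementation or label set the authors used.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (common definition)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0

    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp  # true examples of this class that were missed

        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0

        # Each class contributes in proportion to its share of the data.
        score += (n / total) * f1

    return score


# Hypothetical toy labels: 2 "hate" posts and 4 "none" posts.
y_true = ["hate", "hate", "none", "none", "none", "none"]
y_pred = ["hate", "none", "none", "none", "none", "hate"]
print(round(weighted_f1(y_true, y_pred), 4))
```

In this toy case the minority class scores F1 = 0.5 and the majority class 0.75, so the weighted value is (2/6)(0.5) + (4/6)(0.75) = 2/3. An unweighted (macro) average would instead give 0.625, which is why the choice of averaging should be stated alongside any reported F1.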

Similar resources

Detecting Online Hate Speech Using Context Aware Models

In the wake of a polarizing election, the cyber world is laden with hate speech. Context accompanying a hate speech text is useful for identifying hate speech, which however has been largely overlooked in existing datasets and hate speech detection models. In this paper, we provide an annotated corpus of hate speech with context information well kept. Then we propose two types of hate speech de...

Hate Me, Hate Me Not: Hate Speech Detection on Facebook

While favouring communications and easing information sharing, Social Network Sites are also used to launch harmful campaigns against specific groups and individuals. Cyberbullism, incitement to self-harm practices, sexual predation are just some of the severe effects of massive online offensives. Moreover, attacks can be carried out against groups of victims and can degenerate in physical viol...

Understanding Emoji Ambiguity in Context: The Role of Text in Emoji-Related Miscommunication

Recent studies have found that people interpret emoji characters inconsistently, creating significant potential for miscommunication. However, this research examined emoji in isolation, without consideration of any surrounding text. Prior work has hypothesized that examining emoji in their natural textual contexts would substantially reduce the observed potential for miscommunication. To invest...

A Survey on Hate Speech Detection using Natural Language Processing

This paper presents a survey on hate speech detection. Given the steadily growing body of social media content, the amount of online hate speech is also increasing. Due to the massive scale of the web, methods that automatically detect hate speech are required. Our survey describes key areas that have been explored to automatically recognize these types of utterances using natural language proc...

Context Selection for Embedding Models

Word embeddings are an effective tool to analyze language. They have been recently extended to model other types of data beyond text, such as items in recommendation systems. Embedding models consider the probability of a target observation (a word or an item) conditioned on the elements in the context (other words or items). In this paper, we show that conditioning on all the elements in the c...

Journal

Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing

Year: 2023

ISSN: 2375-4699, 2375-4702

DOI: https://doi.org/10.1145/3589001